Since the eighteenth century, broad-scale patterns of diversity called the attention of naturalists. Recognizing that tropical regions have higher species richness relative to temperate areas, Alexander von Humboldt was the first one to propose it to emerge from climatic differences (Hawkins, 2001). This ubiquitous pattern has since then been known as the Latitudinal Diversity Gradient (LDG) and, although the global distribution of biodiversity is indeed far more complex than a simple unidirectional gradient (Hawkins & Felizola Diniz-Filho, 2004), the difference in species richness between temperate and tropical regions tends to capture the most evident facet of the distribution of life on Earth: its geographic heterogeneity.
Early explanations for the LDG in the 1950s and 1960s followed von Humboldt’s tradition and focused on the strong correlations observed between diversity (i.e., species richness) and components of current environmental variation—especially combinations of temperature and precipitation (Brown, 2014; Hawkins et al., 2003; O’Brien, 2006; Pianka, 1966; Simpson, 1964). These high correlations suggested a causal explanation, and spurred the development of hypotheses that aimed to identify the mechanisms affecting species distributions and hence driving geographical patterns (Currie et al., 2004). Although these diversity-environment correlations suggested “pure ecological explanations” that involved population-level processes tied to dispersal and aggregation of tropical organisms, it quickly became clear that deep-time evolutionary processes should also be taken into account to explain the LDG (Ricklefs, 2004; Rohde, 1992). In fact, as early as 1937, Theodosius Dobzhansky had proposed that diversity gradients should be explained by an interaction between ecological and evolutionary mechanisms, in which evolution would drive the dimensions of the niche—the set of biotic and abiotic factors that allow a species to exist indefinitely—that would allow different patterns of niche packing throughout environmental gradients. Today, it is consensus that the LDG should be explained not only by current climatic factors, but also by the long-term dynamics of such climatic factors and by events happening throughout the evolution of the species (Fine, 2015).
Today we will learn basic tools in R for visualizing species distributions, build geographical ranges, testing drivers of gradients of biodiversity under different approaches.
You will need three datasets, that will be provided for you:
Species occurrence data points – live.oaks.txt
Species geograhical ranges – Furnarii_ranges_geo.shp
Environmental predictors – bio1.bil and bio12.bil
1 Set up your data and your working directory
Set up a working directory and put the data files in that directory. Tell R that this is the directory you will be using, and read in your data:
You can also go ahead and run the next lines if you don’t want to copy a paste the links provided above.
Code
main.dir <-getwd() # Will get the working directory# Print main directorymain.dir# create a Data folderdir.create("Data")urls <-"https://www.dropbox.com/s/36m77nus3taywjp/Archive.zip?dl=1"# Name of the file to downloaddownload.file(url = urls, file.path(main.dir, "Data/Archive.zip"), mode ="wb") # download the file in a specific folderunzip("Data/Archive.zip", exdir ="Data/")
To do this laboratory you will need to have a set of R packages. Install the following packages:
This command, will, first, check if you already the packages installed, then if a package is not installed in your computer, will install it.
IMPORTANT
When installing multiple packages, please pay attention to the messages in your R console. In some cases, R will ask you if you want to install the source version. If that is the case, just type n and hit enter.
Load installed packages:
Code
library(scico)library(rnaturalearth)
The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
which was just loaded, will retire in October 2023.
Please refer to R-spatial evolution reports for details, especially
https://r-spatial.org/r/2023/05/15/evolution4.html.
It may be desirable to make the sf package available;
package maintainers should consider adding sf to Suggests:.
The sp package is now running under evolution status 2
(status 2 uses the sf package in place of rgdal)
Support for Spatial objects (`sp`) will be deprecated in {rnaturalearth} and will be removed in a future release of the package. Please use `sf` objects with {rnaturalearth}. For example: `ne_download(returnclass = 'sf')`
Code
library(smoothr)
Attaching package: 'smoothr'
The following object is masked from 'package:stats':
smooth
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Code
library(terra)
terra 1.7.39
Attaching package: 'terra'
The following object is masked from 'package:tidyr':
extract
The following object is masked from 'package:smoothr':
densify
Code
library(sf)
Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
Double-check your working directory.
Function getwd()
You can use the function getwd() to get the current working directory.
2 From point occurrences to range maps
We will start setting the geographical extent of our study area and to do that we will use spatial data from the package {rnaturalearth}.
Code
sf_use_s2(FALSE)
Spherical geometry (s2) switched off
Code
# Get world map from the {rnaturalearthdata} packageworldMap <-ne_countries(scale ="medium", type ="countries", returnclass ="sf")# cCountry subset - The United States and MexicoNApoly <- worldMap %>%filter(admin =="United States of America"| admin =="Mexico")# trim to study arealimsNA <-st_buffer(NApoly, dist =1) %>%st_bbox()
Warning in st_buffer.sfc(st_geometry(x), dist, nQuadSegs, endCapStyle =
endCapStyle, : st_buffer does not correctly buffer longitude/latitude data
dist is assumed to be in decimal degrees (arc_degrees).
Load species occurrences data points. We will use occurrences from Live oaks, that were obtained from iDigBio between 20 and 24 July 2018 by Jeannine Cavender-Bares. Notice that these occurrence data points were visually examined and any localities that were outside the known range of the species, or in unrealistic locations (e.g., water bodies, crop fields) were discarded.
Rows: 672 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): Species
dbl (2): Longitude, Latitude
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
oaks_occ %>%count(Species) # check how many species and how many observations per species
# Visualizeoaks_occ %>%count(Species) %>%ggplot(aes(x = n)) +geom_histogram() +xlab("Number of observations")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Transform data.frame to spatial data.frame
Code
# to sf object, specifying variables with coordinates and projectionoaks_occ_sf <-st_as_sf(oaks_occ, coords =c("Longitude", "Latitude"), crs =4326) %>%#group_by(species) %>%st_cast("MULTIPOINT") %>%group_by(Species) %>%summarize()
although coordinates are longitude/latitude, st_union assumes that they are
planar
In this section we will learn how to create “simple” range maps based on geometry (e.g. minimum convex polygons, etc.), without considering environmental variables (e.g., ENMs or SDMs). Note that these range maps are geographical abstractions of the species ranges. In other words, a species range is the area where a particular species can be found during its lifetime. Species range includes areas where individuals or communities can migrate or hibernate
We will explore two alternative, one based on simple convex hull and the other is the smoothed convex hull
Please explain the results. How do you feel about that?
Answer: Out of the seven species, three have very large geographical ranges (Q. virginiana, Q. oleoides, and Q. fusiformis). The other four have small geographical ranges, of which Q. brandegei has the smallest range and is restricted to Baja California Sur, Mexico.
Until here we have explored how to plot, clean and build species geographical ranges using occurrences. Now we will use species geographical ranges of the largest continental endemic radiation (Furnariides) to explore the geographical gradients of species diversity.
3 Diversity gradients
3.1 Prepare data and mapping
The geographical ranges correspond to the Infraorder Furnariides (Aves). This data is available thorough BirdLife International and you can use any other group available on IUCN or BIEN (for plants in the Americas). In any case, you first need to download the polygons in shapefile format.
To load the Furnariides geographical ranges, we will use the function st_read() from the package {sf}. Please read the message printed on your console and try to understand the data.
What variables are present in the spatial polygon object? and How many species?
Answer: There are 652 species in this spatial object. The variables included are: Perimeter, Area, AreaKM2, RD, DR, BodyMass, DietInv, DietFruit, DietSeed, StratGroun, Understory, Midhigh, Canopy, Ages.
Explain the distribution for both species (i.e., Furnarius rufus [blue polygon] and Anabazenops dorsalis [red polygon])Are these species distributed in sympatry or allopatry? Explain the selected distribution pattern.
Answer:F. rufus is widely distributed mostly in the lowlands of South America (Bolivia, Brazil, Paraguay, Uruguay and Argentina). A. dorsalis is has a more restricted distribution mostly in the Andes (Bolivia, Perú, Ecuador and Colombia). Despite these two species belong to the same lineaje (Family Furnariidae) they present an allopatric distribution
3.2 Raster of species richness
Species richness is the number of different species represented in an ecological community, landscape or region. Species richness is simply a count of species, and it does not take into account the abundances of the species or their relative abundance distributions.
Now, let’s create a map that represent the species richness of Furnariides.
First create an empty raster for the Neotropics using the extent of the Furnariides ranges under a spatial resolution of 1º long-lat or 111 km at the equator.
Code
# Create raster ro store richness valuesneo_ras <-rast() # empty rasterext(neo_ras) <-ext(franges) # Set the raster "extent" res(neo_ras) <-1# Set the raster "resolution" neo_ras # print the raster object in the console
values(neo_ras) <-0# assign O values to all pixels in the raster
Plot empty raster for the Neotropics
Code
# transform the sf object to a sp objectSApoly_sp <-as(SApoly, "Spatial") # Plot empty rasterplot(neo_ras)plot(SApoly_sp, add =TRUE) ## overlay SA countries to the SR map
Now using the empty raster we will rasterize the species identities in each cell or pixel. The resulting raster will be the species richness of Furnariides across the Neotropics.
Code
f_sr_raster <- terra::rasterizeGeom(x =vect(franges), # species geographical rangesy = neo_ras, # empty rasterfun ="count"# count number of species per grid cell )# this process can take a while depending on your computer (~45 secs in Jesús's computer), please be patient.
Plot the resulting raster.
Code
plot(f_sr_raster)
Let’s try changing the colors using the package {viridis}
Code
# country subsetApoly <- worldMap %>%filter(continent =="South America"| continent =="North America")# transform the sf object to a sp objectApoly_sp <-as(Apoly, "Spatial") # Plot the informationplot(mask(f_sr_raster, Apoly), # raster of species richnesscol = viridis::turbo(10), # colorszlim =c(min(values(f_sr_raster), max(values(f_sr_raster)))), main ="Furnariides species richness")
Warning: [mask] CRS do not match
Code
plot(Apoly_sp, add =TRUE) ## overlay SA countries to the SR map
Or we can try a more fancy way to plot the number of Furnariids’ species. To do that we can use the package {rasterVis} for plotting and the package {RColorBrewer} for selecting color combinations.
Code
library(rasterVis)
Loading required package: lattice
Code
library(RColorBrewer)# First set a thememapTheme <-rasterTheme(region =rev(brewer.pal(11, "Spectral")),layout.widths =list(right.padding =10),axis.line =list(col ="transparent"),tick =list(col ='transparent'))## Now we can plot the rasterp_furna_SR <-levelplot(mask(f_sr_raster, Apoly),maxpixels =1e10,margin =FALSE, main =list('Furnariides \n species richness', col ='darkgray'), par.settings = mapTheme,scales =list(x =list(draw =TRUE),y =list(draw =TRUE)),zlim =c(0, 180))
Warning: [mask] CRS do not match
Code
p_furna_SR
Awesome, right?. Now, please, describe the observed pattern!
Answer: The map shows the classic latitudinal diversity gradient, with a higher accumulation of species close to the equator and a continued decline in the number of species as we approach the South Pole.
3.3 Scale dependency
Now we will explore one of the oldest problems in ecology and evolution, the scale dependency in the data. So to explore this scale dependence, we will rasterize the Furnariides ranges, but using different spatial resolutions from 2º to 6º degrees of long-lat.
Set the empty rasters.
Code
# 2º degreesneo_ras_2dg <-rast()# Set the raster "extent" ext(neo_ras_2dg) <-ext(franges)res(neo_ras_2dg) <-2neo_ras_2dg
Now, rasterize the species richness to the desired pixel size.
Code
# Furnariides at 2º of long-latf_sr_2dg_raster <- terra::rasterizeGeom(x =vect(franges), # species geographical rangesy = neo_ras_2dg, # empty rasterfun ="count"# count number of species per grid cell )# Furnariides at 4º of long-latf_sr_4dg_raster <- terra::rasterizeGeom(x =vect(franges), # species geographical rangesy = neo_ras_4dg, # empty rasterfun ="count"# count number of species per grid cell )# Furnariides at 8º of long-latf_sr_8dg_raster <- terra::rasterizeGeom(x =vect(franges), # species geographical rangesy = neo_ras_8dg, # empty rasterfun ="count"# count number of species per grid cell )
Plot the four maps.
Code
par(mfrow =c (2, 2))## Map at 1 degree grid cellplot(mask(f_sr_raster, Apoly), # raster of species richnesscol = viridis::turbo(10), # colorszlim =c(min(values(f_sr_raster), max(values(f_sr_raster)))), main ="Furnariides species richness 1 degree")
Warning: [mask] CRS do not match
Code
plot(Apoly_sp, add =TRUE) ## Map at 2 degrees grid cellplot(mask(f_sr_2dg_raster, Apoly), # raster of species richnesscol = viridis::turbo(10), # colorszlim =c(min(values(f_sr_raster), max(values(f_sr_raster)))), main ="Furnariides species richness 2 degrees")
Warning: [mask] CRS do not match
Code
plot(Apoly_sp, add =TRUE) ## Map at 4 degrees grid cellplot(mask(f_sr_4dg_raster, Apoly), # raster of species richnesscol = viridis::turbo(10), # colorszlim =c(min(values(f_sr_raster), max(values(f_sr_raster)))), main ="Furnariides species richness 4 degrees")
Warning: [mask] CRS do not match
Code
plot(Apoly_sp, add =TRUE) ## Map at 8 degrees grid cellplot(mask(f_sr_8dg_raster, Apoly), # raster of species richnesscol = viridis::turbo(10), # colorszlim =c(min(values(f_sr_raster), max(values(f_sr_raster)))), main ="Furnariides species richness 8 degrees")
Warning: [mask] CRS do not match
Code
plot(Apoly_sp, add =TRUE)
Code
#dev.off()
So, is there an effect of spatial scale?
Answer: Yes, with increasing the pixel size the pattern becomes diffuse and the detection of biogeographic patterns.
Explain the differences between the four maps
Answer: Maps at 1 (~111 km) and 2 degrees are similar, showing a high concentration of species in the Amazon rainforest. With pixels of 4 and 8 degrees, regions that previously had low diversity now appear to contain a high number of species.
How do you feel about that?
Answer: The scale of analysis matters, and we should take care of it before making conclusions about the observed biodiversity patterns.
3.4 Correlative relationships
3.4.1 Species richness as a function of evolutionary history
Let’s try to rasterize another information from the polygon data set. We will use the information in the column RD, this data correspond to the numbers of nodes from the tips to the root of a phylogenetic tree or just root distance, thus, will use the RD to calculate the MRD metric (mean root distance) that measures the evolutionary derivedness of species within an assemblage (Kerr & Currie, 1999) and can be used to determine whether a local fauna is constituted primarily by early-diverged or by recently originated species (Hawkins et al., 2012; Pinto-Ledezma et al., 2017). In other words, high MRD values means that the community (i.e., grid-cell) is composed mostly by recently originated species, whereas low MRD values by early-diverged species.
Rasterize the species’ Root distance to create a map of Mean Root Distance.
Code
f_MRD_raster <- terra::rasterize(vect(franges), # polygons neo_ras, # empty rasterfield ="RD", # Root distancefun = mean # function mean )
Plot the results
Code
plot(f_MRD_raster, main ='Furnariides mean root distance')plot(Apoly_sp, add =TRUE)
Let’s try changing the colors.
Code
## Now we can plot the rasterp_furna_MRD <-levelplot(f_MRD_raster,maxpixels =1e10,margin =FALSE, main =list('Furnariides \n mean root distance', col ='darkgray'), par.settings = mapTheme,scales =list(x =list(draw =TRUE),y =list(draw =TRUE)),zlim =c(0, 25))p_furna_MRD
Based on the description provided above, please describe the MRD pattern
Answer: The map of mean root distance (MRD) shows an inverse pattern when compared with the patterns of species richness. Specifically, areas/pixels with a high number of species present low MRD, and areas/pixels with a low number of species have high MRD values. This might suggest that areas with low number of co-occurring species are mostly composed by recently-evolved species, conversely, areas with high number of species by species that diverged early in the evolutionary history of the clade.
Let’s plot both raster.
Code
par(mfrow =c(1, 2))plot(mask(f_sr_raster, Apoly), col = viridis::plasma(10), main ="Furnariides species richness")
Warning: [mask] CRS do not match
Code
plot(mask(f_MRD_raster, Apoly), col = viridis::plasma(10), main ="Furnariides mean root distance")
Warning: [mask] CRS do not match
Code
#dev.off()
Check if there is a relationship between the species richness and the evolutionary derivedness.
Call:
lm(formula = values(f_sr_raster) ~ values(f_MRD_raster))
Residuals:
Min 1Q Median 3Q Max
-109.581 -13.659 -0.013 13.919 115.372
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 209.46 4.83 43.37 <2e-16 ***
values(f_MRD_raster) -10.54 0.32 -32.95 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 25.81 on 1644 degrees of freedom
(4729 observations deleted due to missingness)
Multiple R-squared: 0.3977, Adjusted R-squared: 0.3973
F-statistic: 1085 on 1 and 1644 DF, p-value: < 2.2e-16
Code
# get pixel values from raster richnessdata_sr <-as.data.frame(f_sr_raster, xy =TRUE) # get pixel values from raster MRDdata_mrd <-as.data.frame(f_MRD_raster, xy =TRUE)# Combine both datasetsdata_sr_mrd <-left_join(data_sr, data_mrd, by =c("x", "y")) %>%rename(SR = area, MRD = RD) %>%drop_na(MRD)# Plot the associationdata_sr_mrd %>%ggplot(aes(x = MRD, y = SR)) +geom_point(color ="darkgray", size =3, alpha =0.5) +geom_smooth(method ="lm")
`geom_smooth()` using formula = 'y ~ x'
Hmmm. What happened in here? Please answer the next questions.
From the mean root distance map, it is possible to explain the Furnariides diversity gradient? If so, please explain from an evolutionary perspective.
Answer: Species-rich areas present low MRD values, and species-poor areas present high MRD values. MRD is a metric that measures deriveness, and high MRD means that assemblages are composed of recently evolved species. This can also be interpreted as these pixels/areas also present higher speciation rates. On the contrary, areas with low MRD will present low rates of speciation, and the higher species richness in these areas is a product of steady but low rates of speciation.
Save the figures
There are multiple options to save the figures. Jesús particularly like saving his figures in PDF. To save the figures in a pdf file, you can use the following code.
This lines will save your figure in your working directory.
3.4.2 Species richness as a function of environment
Load the environmental variables that correspond to bio1 (Annual Mean Temperature) and bio12 (Annual Precipitation). These data correspond to two variables out of 19 from WorldClim (http://www.worldclim.org/current). We will use these two variables just for educational purposes, rather to make a complete evaluation of the species-environmental relationships.
Code
# Annual Mean Temperaturebio1 <-rast("Data/BioClim/bio1.bil")bio1 <- bio1/10# Annual Precipitationbio12 <-rast("Data/BioClim/bio12.bil")bio12
class : SpatRaster
dimensions : 900, 2160, 1 (nrow, ncol, nlyr)
resolution : 0.1666667, 0.1666667 (x, y)
extent : -180, 180, -60, 90 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326)
source : bio12.bil
name : bio12
min value : 0
max value : 9916
Plot the environmental variables
Code
plot(bio1)
Code
plot(bio12)
Ok, the bio1 and bio12 layers are at global scale, so now will need to crop them to the extent of the Neotropics.
par(mfrow =c(1, 2))plot(bio1_neo, main ="Annual Mean Temperature", col =rev(viridis::inferno(10)))plot(bio12_neo, main ="Annual Precipitation", col =rev(viridis::inferno(10)))
Much better!
Obtain the values from bio1, bio12, SR and MRD for each cell or pixel using the coordinates.
Code
# Get environmental data using coordinates from our mapsf_ras_bios <-extract(x =c(bio1_neo, bio12_neo), # environmental variablesy = data_sr_mrd[, c("x", "y")]) %>%# coordinatesrename(MAT = bio1, MAP = bio12)# Combine the informationfdata <-bind_cols(data_sr_mrd, f_ras_bios)head(fdata)
Now make a simple correlation between the Furnariides richness and bio1 and bio12.
Species richness correlated with Temperature
Code
cor.test(fdata$SR, fdata$MAT)
Pearson's product-moment correlation
data: fdata$SR and fdata$MAT
t = 23.101, df = 1644, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.4576616 0.5306551
sample estimates:
cor
0.4950313
Species richness correlated with Precipitation
Code
cor.test(fdata$SR, fdata$MAP)
Pearson's product-moment correlation
data: fdata$SR and fdata$MAP
t = 35.034, df = 1644, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.6252291 0.6806113
sample estimates:
cor
0.6537949
And also the linear model…
Code
lmbio1 <-lm(SR ~ MAT, data = fdata)summary(lmbio1)
Call:
lm(formula = SR ~ MAT, data = fdata)
Residuals:
Min 1Q Median 3Q Max
-64.532 -20.247 -4.611 17.341 136.721
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.9588 2.4722 -1.197 0.232
MAT 2.5696 0.1112 23.101 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 28.89 on 1644 degrees of freedom
Multiple R-squared: 0.2451, Adjusted R-squared: 0.2446
F-statistic: 533.6 on 1 and 1644 DF, p-value: < 2.2e-16
Code
lmbio12 <-lm(SR ~ MAP, data = fdata)summary(lmbio12)
Call:
lm(formula = SR ~ MAP, data = fdata)
Residuals:
Min 1Q Median 3Q Max
-125.633 -13.056 -3.016 12.382 115.190
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.429e+01 1.236e+00 11.57 <2e-16 ***
MAP 2.471e-02 7.053e-04 35.03 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 25.16 on 1644 degrees of freedom
Multiple R-squared: 0.4274, Adjusted R-squared: 0.4271
F-statistic: 1227 on 1 and 1644 DF, p-value: < 2.2e-16
Which environmental variable is more related with Furnariides richness?
Answer: Both MAT and MAP are positively associated with the number of species. However, MAP shows a more steeper association than MAT. Also, MAP (R^2 = 0.43) explain more variance than MAT (R^2 = 0.24).
Please explain the relationship from an ecological perspective
Answer: The positive association between MAP and MAT with the number of species suggests that wetter and hotter areas contain more species. This might suggest that these areas present high local heterogeneity, allowing more species to coexist or to accumulate over time.
This paragraph was extracted entirely from (F. Dormann et al., 2007): The analysis of spatial data is complicated by a phenomenon known as spatial autocorrelation. Spatial autocorrelation (SAC) occurs when the values of variables sampled at nearby locations are not independent from each other (Tobler, 1970). The causes of spatial autocorrelation are manifold, but three factors are particularly common: 1) biological processes such as speciation, extinction, dispersal or species interactions are distance‐related; 2) non‐linear relationships between environment and species are modelled erroneously as linear; 3) the statistical model fails to account for an important environmental determinant that in itself is spatially structured and thus causes spatial structuring in the response (Besag, 1974). Since they also lead to autocorrelated residuals, these are equally problematic. A fourth source of spatial autocorrelation relates to spatial resolution, because coarser grains lead to a spatial smoothing of data. In all of these cases, SAC may confound the analysis of species distribution data.
We know that a correlation is not a causation, so, to explore the relationship we need to build a model or fit a model. To explore this relationships we will first explore a simple Ordinary Least Square regression or OLS.
Code
fols <-lm(SR ~ MAT + MAP, data = fdata)summary(fols)
Call:
lm(formula = SR ~ MAT + MAP, data = fdata)
Residuals:
Min 1Q Median 3Q Max
-106.713 -13.998 -2.648 11.496 123.965
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.3665430 2.1168885 0.173 0.863
MAT 0.9307702 0.1159763 8.026 1.91e-15 ***
MAP 0.0208258 0.0008444 24.664 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 24.69 on 1643 degrees of freedom
Multiple R-squared: 0.449, Adjusted R-squared: 0.4484
F-statistic: 669.6 on 2 and 1643 DF, p-value: < 2.2e-16
Let’s complicate our model a little bit… Now let’s include the MRD values as covariable (Aka predictor).
Code
fols2 <-lm(SR ~ MAT + MAP + MRD, data = fdata) summary(fols2)
Call:
lm(formula = SR ~ MAT + MAP + MRD, data = fdata)
Residuals:
Min 1Q Median 3Q Max
-74.091 -10.068 -1.527 10.212 125.040
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 97.6603034 7.3781786 13.236 < 2e-16 ***
MAT 0.7196253 0.1109731 6.485 1.17e-10 ***
MAP 0.0133778 0.0009673 13.830 < 2e-16 ***
MRD -5.4480900 0.3975883 -13.703 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 23.39 on 1642 degrees of freedom
Multiple R-squared: 0.5056, Adjusted R-squared: 0.5047
F-statistic: 559.7 on 3 and 1642 DF, p-value: < 2.2e-16
What is telling us these two OLS models?
Answer: The first model, which includes MAT and MAP as predictors, suggests that species richness will increase as a function of increasing precipitation and temperature. This model explains 45% of the variation in species richness. It is important to highlight that despite the slopes being different from zero, the intercept does not, so this output should be considered exploratory.
The second model includes MRD as an additional covariable. By including MRD, the variation explained went up to 50%. So, including an evolutionary-informed covariable allowed us to explain better the variation in the number of species that co-occur in the Neotropics.
Now, explore the spatial autocorrelation of the Furnariides richness gradient. Spatial autocorrelation (it can also be temporal or phylogenetic) is a measure of similarity (correlation) between nearby observations. In other words, the spatial autocorrelation describe the degree two which observations (values) at spatial locations (whether they are points, areas, or raster cells), are similar to each other.
Let’s use a correlogram to explore the spatial autocorrelation. Remember, spatial autocorrelation (it can also be temporal or phylogenetic) is a measure of similarity (correlation) between nearby observations. Thus, high values means high spatial autocorrelation.
Is there a spatial autocorrelation in the data? Please explain your answer
Answer: Yes, we can see that close pixels are more similar in terms of the number of species than pixels that are farther apart. Notably, around the distance class 20, the positive spatial autocorrelation disappears.
What about the residuals? Let’s explore the spatial autocorrelation in the residuals.
x y
[1,] -108.5817 28.51778
[2,] -109.5817 27.51778
[3,] -108.5817 27.51778
[4,] -107.5817 27.51778
[5,] -108.5817 26.51778
[6,] -107.5817 26.51778
Build a neighborhood contiguity by distance. The distance used in this example is 1.5 degrees but you can try with a large distance if you wish to explore more models.
Code
# neighborhood contiguity by distancenb1.5<- spdep::dnearneigh(coords, 0, 1.5)
Using the neighborhood contiguity build spatial weights for neighbor lists.
How many distance classes are necessary to eliminate the spatial autocorrelation in the residuals
Answer: As in the case of species richness, the residuals also present high spatial autocorrelation at short distances, confirming that additional factors or processes are influencing the patterns we observe in nature. In other words, MAP, MAT, and MRD are insufficient to explain the variation in species richness.
Ohhh, seems that the residuals have a strong spatial autocorrelation, that is a problem because if we found autocorrelation in the residuals much of the explanation that we obtain can be biased. See explanation above.
Hmmm, seems that there is a strong spatial autocorrelation, thus any conclusion using the OLS model can be biased.
How do you feel about that?
Answer: Although simple linear models (OLS) can help us detect patterns to make inferences, we need to go at least one step ahead and explore the spatial structure of the data to better understand the factors and processes that generate the observed data.
To try to solve this important issue, we will use spatial simultaneous autoregressive error model estimation (Aka SAR model), this kind of models account for spatial autocorrelation by adding an extra term (autoregressive) in the form of a spatial-weight matrix that specifies the neighborhood of each cell or pixel and the relative weight of each neighbor.
Ok, now we know that the SAR model can solve the problem in the spatial autocorrelation in the residuals, let’s try to make some inferences.
Code
summary(sar_nb1.5.w)
Call:spatialreg::errorsarlm(formula = fols2, data = fdata, listw = nb1.5.w,
na.action = na.exclude, quiet = FALSE, zero.policy = TRUE)
Residuals:
Min 1Q Median 3Q Max
-45.6541 -2.6715 -0.4562 2.1783 52.8991
Type: error
Coefficients: (asymptotic standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 58.29989009 10.26207282 5.6811 1.338e-08
MAT -0.32648785 0.09576075 -3.4094 0.000651
MAP 0.00186297 0.00061516 3.0284 0.002458
MRD -0.76025999 0.27606052 -2.7540 0.005888
Lambda: 0.97712, LR test value: 2966.3, p-value: < 2.22e-16
Asymptotic standard error: 0.0044266
z-value: 220.74, p-value: < 2.22e-16
Wald statistic: 48724, p-value: < 2.22e-16
Log likelihood: -6039.454 for error model
ML residual variance (sigma squared): 68.117, (sigma: 8.2533)
Number of observations: 1646
Number of parameters estimated: 6
AIC: 12091, (AIC for lm: 15055)
Code
summary(fols2)
Call:
lm(formula = SR ~ MAT + MAP + MRD, data = fdata)
Residuals:
Min 1Q Median 3Q Max
-74.091 -10.068 -1.527 10.212 125.040
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 97.6603034 7.3781786 13.236 < 2e-16 ***
MAT 0.7196253 0.1109731 6.485 1.17e-10 ***
MAP 0.0133778 0.0009673 13.830 < 2e-16 ***
MRD -5.4480900 0.3975883 -13.703 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 23.39 on 1642 degrees of freedom
Multiple R-squared: 0.5056, Adjusted R-squared: 0.5047
F-statistic: 559.7 on 3 and 1642 DF, p-value: < 2.2e-16
By looking to the summary of the SAR and OLS models, explain the differences in the coefficients between both models.
Answer: The first difference is that the coefficients (intercepts and slopes) changed in strength and direction. More specifically, the covariable MAT that was positive now for the OLS model, under the SAR model is negative. The effect of MAP and MRD also decreased.
Let’s compare the slopes for both models in a similar way we did when we compared OLS versus PGLS. Let’s start with mean annual temperature.
Code
fdata %>%ggplot(aes(x = MAT, y = SR)) +geom_point(alpha =0.5, color ="darkgray") +labs(x ="Mean annual temperature", y ="Species richness") +geom_smooth(method ="lm", # coefficient OLSse =FALSE, color ="blue", linewidth =2) +geom_abline(intercept =coef(sar_nb1.5.w)[2], # coefficients SARslope =coef(sar_nb1.5.w)[3], color ="red", linewidth =2)
`geom_smooth()` using formula = 'y ~ x'
What about precipitation.
Code
fdata %>%ggplot(aes(x = MAP, y = SR)) +geom_point(alpha =0.5, color ="darkgray") +labs(x ="Annual precipitation", y ="Species richness") +geom_smooth(method ="lm", # coefficient OLSse =FALSE, color ="blue", linewidth =2) +geom_abline(intercept =coef(sar_nb1.5.w)[2], # coefficients SARslope =coef(sar_nb1.5.w)[4], color ="red", linewidth =2)
`geom_smooth()` using formula = 'y ~ x'
And finally mean root distance
Code
fdata %>%ggplot(aes(x = MRD, y = SR)) +geom_point(alpha =0.5, color ="darkgray") +labs(x ="Mean root distance", y ="Species richness") +geom_smooth(method ="lm", # coefficient OLSse =FALSE, color ="blue", linewidth =2) +geom_abline(intercept =coef(sar_nb1.5.w)[2], # coefficients SARslope =coef(sar_nb1.5.w)[5], color ="red", linewidth =2)
`geom_smooth()` using formula = 'y ~ x'
By inspecting the figures, we can see that sometimes, our conclusions about the factors influencing the patterns of biodiversity can change depending on how the data is analyzed.
The last exercise of this tutorial is to compare the prediction of both models. To calculate a R2 to the SAR model, we will use the function SARr2() from Jesús’s GitHub.
SARr2(Lfull = sar_nb1.5.w$LL[[1]], # log likelihood of the model that includes the spatial weightsLnull = sar_nb1.5.w$logLik_lm.model[[1]], # log likelihood null modelN =nrow(fdata) # number of observation )
[1] 0.8350546
Comparing the two models (OLS and SAR), please answer the following questions:
1. Which model have the best explanation?
Answer: The SAR model that includes the spatial structure of the data explain 84% of the variation in species richness. In contrast the OLS model only 50%.
2. What can we conclude from these results?
Answer: Although the covariables used in both models (SAR and OLS) help us to explain the variation in species richness, we can confirm that additional factors masked in the spatial structure of the data explain a high amount of variation.
3. Do the slopes (betas) change from the OLS to the SAR? What about the R2?
Answer: Yes, the coefficients and the R^2 changed drastically by taking into account the spatial structure of the data.
4. How do you feel about that?
Answer: Although simple models are useful for data exploration, if our ultimate goal is to make inferences about the processes that generate the data, then models that account for the structure of the data are necessary.
The end! for now…
References
Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society: Series B (Methodological), 36(2), 192–225. https://doi.org/10.1111/j.2517-6161.1974.tb00999.x
Brown, J. H. (2014). Why are there so many species in the tropics? Journal of Biogeography, 41(1), 8–22. https://doi.org/10.1111/jbi.12228
Currie, D. J., Mittelbach, G. G., Cornell, H. V., Field, R., Guegan, J.-F., Hawkins, B. A., Kaufman, D. M., Kerr, J. T., Oberdorff, T., O’Brien, E., & Turner, J. R. G. (2004). Predictions and tests of climate-based hypotheses of broad-scale variation in taxonomic richness. Ecology Letters, 7(12), 1121–1134. https://doi.org/10.1111/j.1461-0248.2004.00671.x
F. Dormann, C., M. McPherson, J., B. Araújo, M., Bivand, R., Bolliger, J., Carl, G., G. Davies, R., Hirzel, A., Jetz, W., Daniel Kissling, W., Kühn, I., Ohlemüller, R., R. Peres-Neto, P., Reineking, B., Schröder, B., M. Schurr, F., & Wilson, R. (2007). Methods to account for spatial autocorrelation in the analysis of species distributional data: A review. Ecography, 30(5), 609–628. https://doi.org/10.1111/j.2007.0906-7590.05171.x
Fine, P. V. A. (2015). Ecological and evolutionary drivers of geographic variation in species diversity. Annual Review of Ecology, Evolution, and Systematics, 46(1), 369–392. https://doi.org/10.1146/annurev-ecolsys-112414-054102
Hawkins, B. A., Field, R., Cornell, H. V., Currie, D. J., Guégan, J.-F., Kaufman, D. M., Kerr, J. T., Mittelbach, G. G., Oberdorff, T., O’Brien, E. M., Porter, E. E., & Turner, J. R. G. (2003). ENERGY, WATER, ANDBROAD-SCALEGEOGRAPHICPATTERNSOFSPECIESRICHNESS. Ecology, 84(12), 3105–3117. https://doi.org/10.1890/03-8006
Hawkins, B. A., McCain, C. M., Davies, T. J., Buckley, L. B., Anacker, B. L., Cornell, H. V., Damschen, E. I., Grytnes, J.-A., Harrison, S., Holt, R. D., Kraft, N. J. B., & Stephens, P. R. (2012). Different evolutionary histories underlie congruent species richness gradients of birds and mammals: Bird and mammal richness gradients. Journal of Biogeography, 39(5), 825–841. https://doi.org/10.1111/j.1365-2699.2011.02655.x
Kerr, J. T., & Currie, D. J. (1999). The relative importance of evolutionary and environmental controls on broad-scale patterns of species richness in north america. Écoscience, 6(3), 329–337. https://doi.org/10.1080/11956860.1999.11682546
Pianka, E. R. (1966). Latitudinal gradients in species diversity: A review of concepts. The American Naturalist, 100(910), 33–46. https://doi.org/10.1086/282398
Pinto-Ledezma, J. N., Simon, L. M., Diniz-Filho, J. A. F., & Villalobos, F. (2017). The geographical diversification of furnariides: The role of forest versus open habitats in driving species richness gradients. Journal of Biogeography, 44(8), 1683–1693. https://doi.org/10.1111/jbi.12939